ABSTRACT

Image flow, the apparent motion of brightness patterns on the image plane, can provide important visual
information such as distance, shape, surface orientation, and boundaries. It can be determined by either feature
tracking or spatio-temporal analysis. The optical flow thus determined can be used to reconstruct the 3-D scene by
determining the depth from the camera of every point in the scene. However, the optical flow determined by either
of these methods is noisy. As a result, it may be difficult to use the depth information obtained from optical flow
in practical applications such as image segmentation, 3-D reconstruction, and path planning. By using temporal
integration, we can increase the accuracy of both the optical flow and the depth determined from optical flow.

In this work, we describe an incremental integration scheme, called the running average method, to temporally
integrate the image flow. We integrate the depth from the camera obtained using optical flow determined from
gradient based methods, and show that the results of temporal integration are much more useful in practical
applications than the results from local edge operators. Finally, we consider an image segmentation example and
show the advantages of temporal integration.

1.  INTRODUCTION 

Visual information such as distance, shape, orientation, and structure can be obtained using gradient based
methods or feature based methods. Several algorithms have been proposed to determine the 3-D structure of the scene
using local operators. The results obtained using local operators are noisy, and individual measurements have a
large error. Temporal integration is a method in which the individual values determined using local operators are
integrated over a period of time to obtain consistent values for the quantity being determined. In section 2, we
briefly describe two methods of determining the optical flow. In section 3, we describe the Epipolar Plane Image
(EPI), and define terms used in subsequent discussions. We present the running average method of temporal
integration in section 4. In section 5, we show how 3-D reconstruction, i.e., determination of depth from optical
flow, is possible. Experiments and results follow (section 6), and we compare the results of temporal integration
with the results from local edge operators. We also illustrate how image segmentation can be performed successfully
using the results of temporal integration.


2.  ESTIMATION  OF  OPTICAL  FLOW 

Image flow is defined as the apparent motion of brightness patterns on the retina of the eye (or on the image
plane of a camera). With monocular vision (i.e., a single camera), it can be determined using the following methods:

1.  Feature  matching  based  methods: 

In  these  methods,  identifiable  features  from  a sequence  of  images  are  extracted  and  correspondence  is  established. 
The  corresponding  features  are  used  to  calculate  a set  of  disparity  vectors  for  the  sequence.  Any  identifiable  entity  can 
be  used  as  a feature,  but  sharp,  localized  features  give  the  best  accuracy. 

2.  Spatio-temporal methods:

These methods use the spatial and temporal relationships of image intensities to determine the optical flow. The
time history of the position of a feature in the image can be represented in a 3-dimensional coordinate system
(x, y, t). This is often called a spatio-temporal solid or spatio-temporal volume. The spatio-temporal area is
defined as an x-t slice of the solid, i.e., a slice taken with y = constant. The slope at any point (x, t) in the
spatio-temporal area gives the horizontal component of its velocity. Therefore, the horizontal component of optical
flow can be obtained by determining the orientation of the edges in the spatio-temporal area.
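
As a minimal sketch of the x-t slice described above, the following Python fragment builds a tiny synthetic image sequence and extracts the spatio-temporal area for one scan line. The storage layout volume[t][y][x], the function name xt_slice, and the test pattern (a bright pixel moving one pixel to the left per frame) are assumptions made for illustration only.

```python
def xt_slice(volume, y):
    """Return the x-t slice (spatio-temporal area) of the volume at row y.

    volume[t][y][x] is assumed to hold the intensity of pixel (x, y) in
    frame t. The result is indexed as area[t][x], so time runs downward.
    """
    return [list(frame[y]) for frame in volume]


# Synthetic sequence: 4 frames of a 6x2 image, all dark except one bright
# pixel on row 1 that moves one pixel to the left in every frame.
width, height, frames = 6, 2, 4
volume = [[[0] * width for _ in range(height)] for _ in range(frames)]
for t in range(frames):
    volume[t][1][4 - t] = 255

area = xt_slice(volume, y=1)
for row in area:
    print(row)
```

In the printed slice the bright pixel traces a straight, slanted line; its orientation is what encodes the horizontal velocity.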

3.  EPIPOLAR  PLANE  IMAGE 

As discussed above, the time history of each feature point in the image gives the spatio-temporal volume, and a
slice in the temporal direction gives the spatio-temporal area. Figure 1(a) shows a stationary scene containing a
railroad car in the background and a tree in the foreground. Assume that the camera moves in the horizontal
direction, that the optical axis of the camera points perpendicular to the direction of motion, and that the optical
flow is in the negative direction. A horizontal scan line j of the image is selected and the spatio-temporal area
for this scan line is obtained. When the scan line is coincident with the direction of motion of the camera, the
spatio-temporal area is called the Epipolar Plane Image (EPI). The EPI of the line j is shown in Figure 1(b). The
slope corresponding to the tree (closer object) is smaller than the slope corresponding to the railroad car when the
slope is measured from the time (vertical) axis. Thus, depending upon the depth of the objects along the scan line
j, there will be lines of different slopes in the EPI. The greater the depth of the object from the camera, the
greater the slope measured from the time axis will be.

Figure 1: (a) Image at time k; (b) EPI for scan line j.

The slope of the edges, and hence the depth, can be measured using local edge operators. An algorithm to
estimate the depth from optical flow determined using local edge operators is presented in another paper by us in
these proceedings. Here, we use the results of local edge operators and explain how the results can be significantly
improved using temporal integration. Throughout our discussion, we use EPIs such as the one shown in Figure 1(b),
and show depth values at several space-time coordinates.

To familiarize the reader with the EPI and the terms we use in our discussion, consider the sample segment
shown in Figure 1(c) (see also Figure 1(b)). In the figure, the x axis (towards the right) represents space, and
the t axis (downward) represents time. In this example, the x and t axes are both labelled with values from 100 to
114. The unit of measure for the x axis is pixels. For the t axis, the unit is frame-time, with one frame-time
being equal to 1/30 of a second. Each horizontal line shows the intensity of the objects selected along the scan
line at a particular instant of time. For example, the first horizontal line shows the intensity values at time
t = 100, for x = 100 to x = 114. Since the camera is moving, the intensity value at any x changes with respect to t.
In Figure 1(c), as t increases, the objects in the scene, and hence the intensity values on the image plane, move to
the left. For example, the intensity at x = 106 and t = 108 is 115. Let us write this as I(106,108) = 115. Since the
object is moving to the left (relative to the camera), at the next instant of time t = 109, it will have moved
slightly to the left. In this example, let us say that it will move to (105,109), with I(105,109) = 100. These two
points are called corresponding points.

Figure 1(c): Intensity values for a segment of Figure 1(b).

Since the position of the camera is continuously changing, the intensity of corresponding points is not the same.
However, since the camera is looking sideward and moving horizontally, the depth from the camera for all
corresponding points should be the same. In the discussions that follow, we show that the depth results at
corresponding points obtained using local edge operators have significant variations. The variations are so large
that they may make the results unusable for some practical applications. We define temporal integration in the next
section and show that it results in consistent values for optical flow computations.


4.  TEMPORAL  INTEGRATION 

Temporal integration is the method of integrating corresponding local image values obtained at consecutive time
instants over a period of time, so as to average the quantity being integrated. In this application, the quantity
being integrated is the image flow u (which represents the depth d). To integrate two successive frames, we do the
following.

1.  Compute u(x_i, t_j).

2.  Use u(x_i, t_j) to predict the position of the corresponding point (x_k, t_{j+1}) in the next frame in the EPI.

3.  Compute u(x_k, t_{j+1}).

4.  Translate the optical flow at (x_i, t_j) to (x_k, t_{j+1}). Thus, u(x_i, t_j) and u(x_k, t_{j+1}) are both
available at (x_k, t_{j+1}) for further computation.

5.  Determine the average of u(x_i, t_j) and u(x_k, t_{j+1}). This is the integrated value at (x_k, t_{j+1}).
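
The five steps above can be sketched for a single scan line as follows. This is an illustrative reading, not the original implementation: the function name integrate_two_frames, the use of None for pixels without a flow value, and the prediction of x_k by rounding x_i + u to the nearest pixel are all assumptions.

```python
def integrate_two_frames(flow_j, flow_j1):
    """Average the flow at corresponding points of two successive EPI rows.

    flow_j  -- flow values u(x_i, t_j) per pixel, None where no flow exists
    flow_j1 -- flow values u(x_k, t_{j+1}) per pixel, None where no flow exists
    Returns the row for t_{j+1} with integrated values where a pair was found.
    """
    width = len(flow_j1)
    integrated = list(flow_j1)
    for x_i, u_i in enumerate(flow_j):
        if u_i is None:
            continue                      # step 1: no flow measured here
        x_k = round(x_i + u_i)            # step 2: predicted position (assumed rule)
        if not 0 <= x_k < width:
            continue                      # the point has left the scan line
        u_k = flow_j1[x_k]                # step 3: flow measured in the next frame
        if u_k is None:
            continue
        # steps 4 and 5: both values are now available at x_k; average them
        integrated[x_k] = (u_i + u_k) / 2.0
    return integrated


# Hypothetical flow values in pixels per frame-time (negative = leftward).
row_j = [None, None, -0.75, None]
row_j1 = [None, -0.5, None, None]
result = integrate_two_frames(row_j, row_j1)
print(result)  # [None, -0.625, None, None]
```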

To achieve real-time results, we use an incremental algorithm for temporal integration, known as
the running average method. The algorithm for finding the running average is given below.

1.  Determine the optical flow.

2.  Create an image count according to the rule:

FOR all pixels, IF flow > 0, count = 1 OTHERWISE, count = 0.

3.  Initialize an integrated optical flow image and an accumulated count image.

4.  Use the integrated optical flow to determine the correspondence with the optical flow.

5.  Using the correspondence established, translate both the integrated optical flow and the accumulated count.

6.  Obtain the updated count using

$$\text{Updated count} = \text{count} + \text{Accumulated count}$$

7.  Calculate the integrated optical flow using

$$\text{Integrated optical flow} = \frac{\text{Optical flow} + \text{Integrated optical flow} \times \text{Accumulated count}}{\text{Updated count}}$$
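
A minimal sketch of steps 2, 6, and 7 for one scan line is given below. It assumes that the integrated flow and the accumulated count have already been translated to the current frame (steps 4 and 5), and that flow is stored as a non-negative magnitude so that the rule "flow > 0" applies; the function name running_average_step and the sample values are hypothetical.

```python
def running_average_step(flow, integrated, acc_count):
    """Apply one running-average update to a scan line.

    flow       -- newly measured optical flow per pixel (0 where none exists)
    integrated -- integrated optical flow, already translated to this frame
    acc_count  -- accumulated count, already translated to this frame
    Returns (new_integrated, updated_count).
    """
    new_integrated, updated_count = [], []
    for u, u_int, n in zip(flow, integrated, acc_count):
        count = 1 if u > 0 else 0         # step 2: count image
        total = count + n                 # step 6: updated count
        if total == 0:
            new_integrated.append(0.0)    # nothing measured here so far
        else:
            # step 7: weighted mean of the new value and the stored average
            new_integrated.append((u * count + u_int * n) / total)
        updated_count.append(total)
    return new_integrated, updated_count


# Three hypothetical measurements at pixel 0; pixel 1 never has a flow value.
integrated, acc_count = [0.0, 0.0], [0, 0]
for flow in ([0.5, 0.0], [0.75, 0.0], [0.25, 0.0]):
    integrated, acc_count = running_average_step(flow, integrated, acc_count)
print(integrated, acc_count)  # [0.5, 0.0] [3, 0]
```

Only the current average and a count are stored per pixel, so the cost of an update does not grow with the number of frames integrated.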

Establishing the correspondence between two consecutive frames is a difficult problem. In this work, we have
used optical flow values less than one pixel per frame-time, because larger optical flow values obtained using
gradient based methods are known to induce large measurement errors due to undersampling. Since the optical flow,
i.e., the apparent motion of brightness patterns on the image plane, is less than one pixel per frame-time, we do
not determine the correspondence at every time instant. Instead, we update the ith, (i+1)th, and (i-1)th pixels at
the next instant of time, and only if a value for optical flow is present at that location.
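
The neighbour-based update can be read in more than one way. The sketch below shows one possible interpretation, stated here as an assumption: for every pixel j that has a new flow value, the stored average is fetched from pixel j, j-1, or j+1 of the previous instant (in that order of preference), which is sufficient when the flow is below one pixel per frame-time. The function name and the search order are hypothetical.

```python
def translate_to_new_frame(count, integrated, acc_count):
    """Carry the stored average and count to pixels that have a new flow value.

    count      -- 1 where the new frame has a flow value, else 0
    integrated -- integrated optical flow from the previous instant
    acc_count  -- accumulated count from the previous instant
    Returns (translated_integrated, translated_acc_count).
    """
    width = len(count)
    new_integrated = [0.0] * width
    new_acc = [0] * width
    for j in range(width):
        if count[j] == 0:
            continue                      # no flow present at this location
        for i in (j, j - 1, j + 1):       # assumed search order
            if 0 <= i < width and acc_count[i] > 0:
                new_integrated[j] = integrated[i]
                new_acc[j] = acc_count[i]
                break
    return new_integrated, new_acc


# The feature stored at pixel 2 is now measured at pixel 1.
moved = translate_to_new_frame([0, 1, 0, 0], [0.0, 0.0, 0.5, 0.0], [0, 0, 3, 0])
print(moved)  # ([0.0, 0.5, 0.0, 0.0], [0, 3, 0, 0])
```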


5.  DEPTH  FROM  OPTICAL  FLOW 

Consider a sideward-looking camera moving horizontally with a velocity V. In one time-instant, an object at P at
a distance d from the camera moves to Q. At the same time, on the image plane, the image of the object moves from
A to B, the distance being equal to the optical flow u.

If f is the focal length of the camera, using similar triangles OAB and OPQ, we get:

$$u = -\frac{fV}{d}$$

If the camera is moving with a uniform velocity V, then the numerator of the above equation is a constant,
K = fV. The equation can be written as follows.

$$u = -\frac{K}{d} \qquad (1)$$

In the case of a horizontally moving camera where the motion is along the x axis, the component of motion in
the y direction is zero. An equation for the optical flow u is given by:

$$u = -\frac{I_t}{I_x} \qquad (2)$$

where I_x and I_t are the partial derivatives of the image intensity I with respect to x and t.

Using Equations (1) and (2), an equation for the depth d of an object from the camera can be obtained.

$$d = \frac{K I_x}{I_t} \qquad (3)$$

In  the  discussions  that  follow,  we  have  used  Equation  (3)  to  determine  the  depth  of  objects  from  the  camera.  For 
more  information  on  the  derivation  of  these  equations,  refer  to  the  references  listed  in  this  paper. 
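
As a rough illustration of how the depth could be evaluated at one point of an EPI, the sketch below estimates the intensity derivatives with central differences and applies the gradient relations u = -I_t / I_x and d = K I_x / I_t. Central differences are a stand-in chosen for brevity and are not the local edge operator used in the experiments; the function names, the value of K, and the synthetic intensity ramp are assumptions.

```python
def gradients(area, x, t):
    """Central-difference estimates of I_x and I_t at (x, t); area[t][x]."""
    i_x = (area[t][x + 1] - area[t][x - 1]) / 2.0
    i_t = (area[t + 1][x] - area[t - 1][x]) / 2.0
    return i_x, i_t


def flow_from_epi(area, x, t):
    """Optical flow u = -I_t / I_x, or None where I_x vanishes."""
    i_x, i_t = gradients(area, x, t)
    return None if i_x == 0 else -i_t / i_x


def depth_from_epi(area, x, t, k):
    """Depth d = K * I_x / I_t, or None where I_t vanishes."""
    i_x, i_t = gradients(area, x, t)
    return None if i_t == 0 else k * i_x / i_t


# Synthetic EPI: an intensity ramp drifting left at 0.5 pixel per frame-time.
area = [[10.0 * (x + 0.5 * t) for x in range(6)] for t in range(5)]
K = 100.0  # hypothetical constant (focal length times camera velocity)

u = flow_from_epi(area, x=2, t=2)
d = depth_from_epi(area, x=2, t=2, k=K)
print(u, d)  # -0.5 200.0
```

The two results agree with the relation u = -K / d, since -100 / 200 = -0.5.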


6.  EXPERIMENTS,  RESULTS  AND  DISCUSSION 

A top view of the setup for our experiments is shown in Figure 2. We use an optical rail which has two platforms as
shown in the figure: one translates horizontally and the other rotates about a vertical axis. The camera is mounted
on the platforms as shown. The accuracy of the velocity of translation is within 0.02% of the selected velocity.
Objects such as O1 and O2 can be placed in front of the camera. In our experiments, we placed a single object in
front of the camera and obtained the EPI using a real-time, high-speed image processing machine called the Pipelined
Image Processing Engine (PIPE). PIPE was conceived and designed at the National Institute of Standards and
Technology (formerly National Bureau of Standards) by Kent et al. and is commercially available through Aspex
Incorporated. It was designed specifically for low level vision tasks at very high speed. It has 8 bit gray scale
resolution and operates on 256x256 images at video rate (60 fields per second). It can also operate on images of
larger size at lower rates. A complete system can perform over one billion operations per second. For further
details on PIPE, refer to the references listed in this paper.

Figure 2: Experimental setup.

We  have  used  PIPE  to  obtain  the  EPI  at  the  rate  of  30  fields  per  second.  The  EPI  is  used  to  determine  the  depth, 
and  a program  on  the  SUN  computer  is  used  to  temporally  integrate  the  results.  The  results  of  our  experiments  are 
given  below. 

Experiment  #1: 




Figure  3:  Experiment  #1:  (a)  Original  Scene;  (b)  EPI  for  (a);  (c)  optical  flow  for  (b). 

We placed an object in front of the camera as shown in Figure 3(a) and obtained its EPI (Figure 3(b)). We used
the EPI to determine the depth from Equation (3) with a 5x5 local edge operator (see Figure 3(c)). The statistics
for the whole image are given below. The mean represents the average distance (in millimeters) of the object from
the camera determined using Equation (3). Figure 4 shows the depth values (in centimeters) for a small segment of
the image.

Results Using Edge Operators

Figure 4: Depth values for a segment of the EPI in Figure 3(b).

Figure 5: Integrated depth values for the EPI in Figure 3(b).


In the segment of the image shown in Figure 4, all the points are at the same distance from the camera.
Therefore, we expect all points to have the same value for depth. However, in the results obtained using local
operators, there is a large variation. In the small segment shown, the range for depth values is
(138 - 103) = 35 cm, or 350 mm. Because of this variation, the results may not be suitable for practical
applications such as image segmentation, path planning, and 3-D reconstruction. Figure 5 shows the segment of the
integrated image that corresponds to the segment shown in Figure 4. It shows the results of temporal integration on
the depth values obtained from local edge operators. The integration was started at t = 20, and within 8
frame-times, the depth values become consistent. A graph of the average depth versus time is shown in Figure 6 for
both segments. It shows how the results converge in the case of temporal integration, thus making them more
suitable for practical applications.


Figure 6: Graph of average depth vs. time. (Plot title: Depth from Camera vs Frame-Time, 1 frame-time = 1/30 sec;
curves: Without Integration, With Integration.)


Figure 7: Experiment #2: (a) Original Scene; (b) EPI for (a); (c) optical flow for (b).

Figure 8a: Depth values for a segment of 7(b).

Figure 8b: Depth values for a segment of 7(b).

Figure 8c: Depth values for a segment of 7(b).

Figure 9a: Integrated depth values corresponding to 8a.

Figure 9b: Integrated depth values corresponding to 8b.

Figure 9c: Integrated depth values corresponding to 8c.


Experiment  #2: 

This experiment shows that the results of temporal integration can be used successfully for practical applications
such as image segmentation. Figure 7(a) shows the object, which is an inclined white board containing a number of
black stripes. The EPI for this image is shown in Figure 7(b). It can be seen from the EPI that the stripes have
different slopes because they are at different distances from the camera. The depth from the camera is shown in
Figure 7(c). Figures 8a-8c show three segments of the depth values from Figure 7(c). It can be seen that it may be
difficult to segment the stripes based on the depth values from simple local edge operators.

Figures 9a-9c show the integrated results corresponding to Figures 8a-8c respectively. It can be seen that the
results of temporal integration converge within 6-8 frames, and thus can be used to segment the image in
Figure 7(a) using simple thresholding. The integrated output can also be used in other practical applications.
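
Simple thresholding of integrated depth values might look like the sketch below. The depth values and thresholds are invented for illustration and are not measurements from this experiment; the function name segment_by_depth and the use of None for pixels without a depth value are likewise assumptions.

```python
def segment_by_depth(depths, thresholds):
    """Assign a segment label to each depth value by simple thresholding.

    depths     -- integrated depth per pixel, None where no value exists
    thresholds -- depth boundaries between neighbouring segments
    A pixel gets label n when its depth is at or above exactly n thresholds;
    pixels without a depth value get label -1.
    """
    labels = []
    for d in depths:
        if d is None:
            labels.append(-1)
        else:
            labels.append(sum(1 for th in thresholds if d >= th))
    return labels


# Hypothetical integrated depths (cm) along one scan line of a striped board.
depths = [101.0, 102.0, None, 119.5, 121.0, None, 140.0]
labels = segment_by_depth(depths, thresholds=[110.0, 130.0])
print(labels)  # [0, 0, -1, 1, 1, -1, 2]
```

Thresholding of this kind only separates the stripes when the spread of depth values within a stripe is smaller than the spacing between stripes, which is the condition that temporal integration is meant to establish.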


7.  CONCLUSION 

Image flow can be used for reconstruction of 3-D scenes, but the results obtained using local operators are
very noisy and inaccurate. A scheme to integrate the results over a period of time yields better results. Because of
the volume of data involved in image processing, incremental integration methods are used for temporal integration,
since they offer real-time performance. The results from temporal integration, when compared to those obtained
without integration, show that considerable accuracy can be gained using temporal integration. The individual depth
values from segments of the EPI show that the output of temporal integration becomes consistent after integration
over 6-8 frames. Experiments performed in our laboratory have shown that the integrated values are consistent and
can be successfully used for image segmentation.